A common genetic variation in GZMB may associate with cancer risk in patients with Lynch syndrome

Lynch syndrome (LS), also known as hereditary nonpolyposis colorectal cancer syndrome (HNPCC) is a common genetic predisposition to cancer due to germline mutations in genes affecting DNA mismatch repair. Due to mismatch repair deficiency, developing tumors are characterized by microsatellite instability (MSI-H), high frequency of expressed neoantigens and good clinical response to immune checkpoint inhibitors. Granzyme B (GrB) is the most abundant serine protease in the granules of cytotoxic T-cells and natural killer cells, mediating anti-tumor immunity. However, recent results confirm a diverse range of physiological functions of GrB including that in extracellular matrix remodelling, inflammation and fibrosis. In the present study, our aim was to investigate whether a frequent genetic variation of GZMB, the gene encoding GrB, constituted by three missense single nucleotide polymorphisms (rs2236338, rs11539752 and rs8192917) has any association with cancer risk in individuals with LS. In silico analysis and genotype calls from whole exome sequencing data in the Hungarian population confirmed that these SNPs are closely linked. Genotyping results of rs8192917 on a cohort of 145 individuals with LS demonstrated an association of the CC genotype with lower cancer risk. In silico prediction proposed likely GrB cleavage sites in a high proportion of shared neontigens in MSI-H tumors. Our results propose the CC genotype of rs8192917 as a potential disease-modifying genetic factor in LS.

Lynch syndrome (LS), also known as hereditary nonpolyposis colorectal cancer syndrome (HNPCC) is a common genetic predisposition to cancer due to germline mutations in genes affecting DNA mismatch repair. Due to mismatch repair deficiency, developing tumors are characterized by microsatellite instability (MSI-H), high frequency of expressed neoantigens and good clinical response to immune checkpoint inhibitors. Granzyme B (GrB) is the most abundant serine protease in the granules of cytotoxic T-cells and natural killer cells, mediating anti-tumor immunity. However, recent results confirm a diverse range of physiological functions of GrB including that in extracellular matrix remodelling, inflammation and fibrosis. In the present study, our aim was to investigate whether a frequent genetic variation of GZMB, the gene encoding GrB, constituted by three missense single nucleotide polymorphisms (rs2236338, rs11539752 and rs8192917) has any association with cancer risk in individuals with LS. In silico analysis and genotype calls from whole exome sequencing data in the Hungarian population confirmed that these SNPs are closely linked. Genotyping results of rs8192917 on a cohort of 145 individuals with LS demonstrated an association of the CC genotype with lower cancer risk. In silico prediction proposed likely GrB cleavage sites in a high proportion of shared neontigens in MSI-H tumors. Our results propose the CC genotype of rs8192917 as a potential disease-modifying genetic factor in LS. KEYWORDS granzyme B, Lynch syndrome, immunotherapy, microsatellite instability, colorectal cancer, mismatch repair deficiency, cancer neoantigens, immune infiltration Introduction Cancer neoantigens are de novo amino acid sequences produced by cancer cells provoking antitumor immune response (1). As multiple layers of evidence support the clinically effective modification of the host's immune system in fighting neoplastic diseases mainly through pharmacologic inhibition of immune checkpoints (2,3), cutting-edge approaches in oncoimmunology aim to stimulate immune responses by cancer vaccines (4) and in vitro modification of effector T cells (5). Although most cancer neoantigens are unique to one's cancer, several cancers, especially those characterized by microsatellite instability (MSI-H) share several recurrent neoantigens (6,7), providing a rationale for off-the-shelf cancer vaccines not only in the adjuvant/metastatic setting (8), but also as a part of primary prevention in high-risk individuals (9,10).
Although MSI-H cancers are sporadic tumors in the majority of the cases, Lynch syndrome (LS), a frequent cancer predisposition syndrome with a prevalence of 1:250-500 significantly elevates the risk of developing MSI-H neoplasms (11). LS is caused by germline pathogenic mutations in genes disrupting the optimal function of the DNA mismatch repair machinery (MLH1, MSH2, MSH6, PMS2, EPCAM), and is inherited in an autosomal dominant manner. Frequent manifestations include colorectal cancer (CRC) and endometrial cancer, although it mildly elevates the risk of a spectrum of malignancies (12). Specific cancer risks vary by the gene concerned; mutations in MLH1, MSH2 and EPCAM result in similarly higher risks for CRC (~50% lifetime risk) and endometrial cancer (~50% lifetime risk) (12). Previous studies have demonstrated high frequencies of effective immunity against the shared neoantigens formerly described in MSI-H tumors not only in LS-associated cancer patients but also in healthy individuals living with LS (13,14). Optimal immunosurveillance acquires a key part in the host's prevention of carcinogenesis (15). In healthy patients with LS, normal colonic mucosa exhibits higher frequencies of a wide range of immune cell populations, while this is reduced in LS-associated cancer (16). In colonic premalignant lesions, a higher amount of lymphocyte -activation gene 3 immune checkpoint expression facilitates immune evasion and carcinogenesis (17). Immune-tumor interactions, and immunoediting in particular, are dynamically changing after malignant transformation, the most important of which is the frequent loss of b 2 microglobulin expression, resulting in impaired HLA class I antigen expression (15,(18)(19)(20).
Granzymes are serine proteases, which are the main components of the granules of cytotoxic T-cells and natural killer (NK) cells eliciting perforin-mediated apoptosis of target cells (21). This mechanism is a key effector element in antimicrobial and antitumor immune responses. Granzyme B (GrB) is the most abundant granzyme present in cytotoxic granules, however, recent studies have uncovered several additional molecular mechanisms, in which GrB maintains significant roles (22). In particular, by the expression of GrB in a wide variety of normal epithelial cells and cancer cells, GrB alters extracellular matrix remodeling, epithelialto-mesenchymal transition and fibrosis (22).
GrB is a 33 kDa protein, which is encoded by the gene GZMB in humans. Former genetic studies have confirmed three closely linked common single nucleotide polymorphisms (SNPs) all resulting in missense mutations (rs8192917 Q48R, rs11539752 P88A and rs2236338 Y245H) ( Figure 1A) (26,27). An initial study proposed that Granzyme B harbouring the minor RAH haplotype (resulting from the minor alleles of these SNPs) is incapable of inducing apoptosis (26), however a follow-up study leveraging several layers of evidence including enzyme activity assays dismissed this possibility (27). Nevertheless, there is still no clear evidence on how these three missense alterations, relatively far from the catalytic triad might affect the function of the enzyme.
In the present study, we aimed to analyze the association between rs8192917, a tagging SNP of the RAH haplotype and cancer risk in high-risk individuals with LS. Additionally, based on our in silico analysis we anticipate that a significant portion of shared neoantigens in MSI-H cancers can be cleaved by GrB. This study strengthens the immune-related pathogenetic contribution to LS-associated tumorigenesis and invites further investigations in independent LS cohorts to validate the observed disease-modifying nature of rs8192917.

Materials and methods
Subjects, DNA isolation and genotyping 145 individuals harboring germline pathogenic mutations of MLH1, MSH2 or terminal deletions of EPCAM were consecutively enrolled between 1994 and 2021. Baseline characteristics of study subjects are presented in Table 1. Following genetic counseling and written informed consent, DNA was isolated from peripheral blood using the Gentra Puregene Blood Kit (Qiagen, Cat No.: 158389). Mutation analysis was performed using methods described earlier (28,29) including Sanger sequencing and multiplex ligationdependent probe amplification (MLPA). The study was approved by the Scientific and Research Ethics Committee of the Medical Research Council of Hungary (ETT-TUKEB 53720-7/2019/EÜIG).
For the genotyping of rs8192917, a predesigned TaqMan Allelic Discrimination assay was used (assay ID: C:_2815152_20). Quantitative real-time PCR was performed on an Applied Biosystems 7500 Fast PCR instrument according to the manufacturer's instructions.

Linkage disequilibrium analysis of rs2236338, rs11539752 and rs8192917
To investigate if a tagging SNP from rs2236338, rs11539752 and rs8192917 can be selected we analyzed the linkage disequilibrium of these SNPs. Firstly, LDMatrix online tool (LDLink, version 5.3.3) was used to assess the linkage disequilibrium of these SNPs in various populations (30).
Additionally, to investigate linkage disequilibrium in the same population from which our LS cohort was selected, we leveraged whole exome sequencing (WES) data from our center obtained between 2017-2022. Patients included in this cohort were advised to have their germline DNA subjected to clinical WES based on their individual and family cancer history and the lack of germline pathogenic mutation detection in prior targeted genetic testing. As patients throughout the entire country were included in this database following the recommendations of national centers of excellence, this database can be considered as representative of the current Hungarian population.
WES data of 276 independent individuals were included in this anonymized Hungarian germline sequencing database. WES was performed using standard methods. Sequencing data analysis was performed following the Genome Analysis Toolkit (GATK) best practices guidelines. Paired-end sequencing data were obtained in FASTQ file format and reads were trimmed using Cutadapt to remove adapters and bases where the PHRED quality value was less than 20. The trimmed reads were aligned to Genome Reference Consortium Human Build hg19 using Burrows-Wheeler Aligner (bwa-mem2-2.0) (31). Picard tools were used to sort, marking duplicates and index reads. Base Quality Score Recalibration (BQSR) was performed using (GATK) (32). Variant discovery was performed in two steps: single-sample variant calling was performed using HaplotypeCaller in GATK; this was followed by GenotypeGVCFs to combine variants from single-sample gVCFs to the multi-sample VCF. Variant annotations were executed using Funcotator. Genotype calls of rs2236338, rs11539752 and rs8192917 were performed as earlier reported (33). Briefly, allele ratios of 0-0.1 and 0.90-1 were considered homozygous, 0.3-0.7 were considered heterozygous, while allele ratios of 0.1-0.3 and 0.7-0.9 were uncalled. The minimum sequencing depth for homozygous calls was 5 reads/allele, and for heterozygous calls was 10 reads/ allele. Genotype calls were further analyzed using Haploview 4.2 software (34).

In silico analysis of Granzyme B cleavage sites in shared frameshift neoantigens
The ability of Granzyme B to cleave shared neoantigens present in MSI-H tumors was analyzed by the PROSPERous online tool (35). Briefly, shared neoantigens which are developed as frameshift peptides due to repeated mismatch repair deficiency provoking an immune response in independent patients with MSI-H tumors were selected based on the prior study of Ruodko et al. (7). The amino acid sequences of the selected neoantigens were subjected to the PROSPERous prediction algorithm using the P4-P2' cleavage site and logistic regression options.

Statistical analysis
Statistical analysis was performed using GraphPad Prism 9.0 software. For analyzing the correlation between cancer occurrence and SNP status log-rank test, c 2 and Fisher's exact tests were used. P-values < 0.05 were considered statistically significant. Post-hoc power calculations were performed on significant differences observed by Fisher's exact test to assess the role of sample size. Statistical power > 80% was considered to be strong statistical power.

Results
SNPs rs2236338, rs11539752 and rs8192917 are in linkage disequilibrium in the investigated Central European population We performed a linkage disequilibrium analysis of SNPs rs2236338, rs11539752 and rs8192917. First, we analyzed an anonymized exome sequencing database of Hungarian patients with various medical histories. This cohort includes patients from the whole country and therefore is representative of the current Hungarian population. Within this cohort, the three SNPs were found to be in linkage disequilibrium ( Figure 1B). Additional in silico analysis in the LDmatrix database verified the universal nature of this association in various populations ( Figures 1C, D).
Since the results indicated an extremely strong linkage disequilibrium between the analyzed three SNPs, we selected rs8192917 as the tagging SNP for this haplotype for further analysis based on the fact that extensive previous research initiatives analyzed its' association with various conditions and diseases (36-42).
As expected, the most frequent manifestation of LS was CRC followed by endometrial cancer in women, with more than 75% of the cohort exhibiting any manifestation of LS (Table 1). This LSassociated cancer risk corresponds well with data from the PLSD (70-80%), while our cohort contains a larger percentage of CRC cases (67% vs. 50%) possibly due to referral bias (12).
Genotyping of rs8192917 in our LS cohort revealed 85 individuals with TT, 49 individuals with CT and 11 individuals with CC genotype resulting in a minor allele frequency (MAF) of 0.24. This MAF is similar to the MAF of 0.23 observed in the European (non-Finnish) population of the gnomAD database (version 2.1.1) (43).
LS individuals harboring the CC genotype were less likely to develop CRC, the main manifestation of the syndrome (Figures 2A,  B, Table 2A). Moreover, individuals homozygous for the minor allele 'C' were less likely to develop LS-associated tumor manifestations ( Figures 2C, D, Table 2B). Since LS-associated tumor risks differ in women vs. men mainly due to the risk of endometrial cancer, we performed subgroup analyses based on sex as well (Supplementary Figure 1, Tables 2C, D) and have found that rs8192917-dependent differences are maintained in the female cohort. Statistical power calculations revealed the associations found in the total cohort to be mild, however statistical power was found to be strong when analyzing LS-associated tumor occurrence restricted to women ( Table 2). Additional calculations restricted to LS-associated tumor manifestations without CRC or restricted to endometrial cancer occurrence in women have found no associations, probably due to the low sample sizes in these subgroups (Supplementary Table 1). If multiple cancers in the same organ occurred in the same patient, the age at the first occurrence was included. Data are presented as n (% of total/female subjects) or mean ± standard deviation.

Granzyme B is predicted to cleave multiple shared neoantigens in MSI-H cancers
Based on the study of Roudko et al, 46 neoantigens shared between MSI-H tumors were included in the in silico analysis (7) to assess probable cleavage by Granzyme B (35). Upon the scores of each cleavage sites, putative loci were classified as likely (score>5.0, n = 27) and unlikely (score<1.0, n = 1575) cleavage sites ( Figure 3A). Applying this threshold, 19 out of 46 (41.3%) shared neoantigens harboured likely cleavage sites ( Figure 3B, Supplementary Table 2).

Discussion
Identifying disease-modifying genetic factors has tremendous potential in personalizing screening protocols and treatment options, especially in individuals living with high risk for cancer development. Primary studies in individuals living with hereditary predisposition to breast and ovarian cancer found multiple SNPs which may alter cancer risk (44,45). Based on these results recent studies started to investigate the predictive ability of polygenic risk scores for cancer risk prediction in women living with germline mutations in BRCA1 and BRCA2 (46,47). In LS, earlier studies have found some SNPs which can affect personalized risk of cancer (48)(49)(50)(51). In particular, two studies confirmed the association of genomic regions 8q23.3 and 11q23.1 with CRC risk in individuals with LS (48,51). A recent study found an association between rs2075786, a SNP of the gene encoding telomerase reverse transcriptase (TERT) and cancer risk in individuals with germline MSH2 mutation in a large international cohort (50). Nevertheless, studies leveraging these results to construct polygenic risk scores in this cohort are still scarce. However, a pioneering study in individuals with germline PMS2 mutations provided a rationale to investigate this matter more thoroughly (49). In anticipation of these studies we aimed to investigate if immunogenetic factors, specifically genetic variants of GrB might associate with cancer risk in LS.
Rs2236338, rs1159752 and rs8192917 are three functionally active SNPs resulting in amino acid change. Jeong et al. previously showed that these three SNPs constitute a haploblock in the Korean population (36). Similarly to this result, by the analysis of the Central European population in the LDMatrix online tool and also in our Hungarian cohort of 276 independent patients, where WES was performed, we found that these three SNPs are closely linked together. Based on these results we selected rs8192917 as a tagging SNP of this haplotype for further genotyping in our LS cohort.
The CC genotype of rs8192917 was associated with decreased risk for LS-associated tumors in our cohort. This association was significant when the analysis was restricted to females, while in males a tendency toward this association was verified. The association of the CC genotype with decreased cancer occurrence was also validated when the analysis was restricted to CRC, but not to endometrial cancer and other, less frequent LS manifestations, possibly due to the insufficient sample size in this regard. Earlier, the 'C' allele of rs8192917 has been associated with vitiligo, an autoimmune dermatologic condition in three independent cohorts (36-38). Moreover, rs8192917 has also been associated with subacute sclerosing panencephalitis (39) and postoperative keloids (40) in single cohorts. Additionally, this SNP has been associated with transplantation outcomes after HLA-matched unrelated bone marrow transplantation (41). On the functional level, a study has suggested that rs8192917 associates with natural killer cell cytotoxicity (42), however further functional validation is needed to determine the specific nature of this finding. Nonetheless, another GZMB SNP, independent of rs8192917 has been associated with joint destruction in rheumatoid arthritis (52), strengthening the role of genetic variants of GZMB in immune-related pathogenic mechanisms. Although these previous lines of evidence suggest a wide-raging disease-modifying effect of GZMB genetic variants, mechanistic insights are still lacking. Sun et al. have clearly demonstrated that the minor RAH haplotype retains the pro-apoptotic activity of GrB (27), however, GrB has multiple roles in extracellular matrix remodeling, epithelial-to-mesenchymal transition, inflammation and fibrosis (22). QPY and RAH variants may have different substrate-specificities, which might directly affect some of these mechanisms. Regarding autoimmune pathomechanisms as in the case of generalized vitiligo, it has already been suggested that altered cleavage of autoantigens might explain the previously confirmed association of rs8192917 with this disease (38). Transposing this hypothesis to MSI-H carcinogenesis in LS, our in silico analysis revealed that a large proportion of shared neoantigens in MSI-H cancers encompass likely GrB cleavage sites. Differential cleavage of these frameshift peptides serving as neoantigens might directly affect optimal immunosurveillance, a key feature in LS-related tumorigenesis (15). Moreover, performing in-depth analyses regarding autoimmune conditions and cancer risk in individuals with LS might further shed light on the immunological pathomechanism of these diseases which can possibly be affected by GrB and other mediators. It is important to note the limitations of our study. Deriving from its' monocentric nature, we cannot infer the observed association to other populations. Although our cohort is smaller than those of large international consortia, such as the PLSD (12), it is still the largest Hungarian Lynch syndrome cohort ever studied and is comparable to a Swedish national study, where the country's population is also comparable to Hungary (53). By disregarding individuals with MSH6 and PMS2 mutations, where cancer risk is significantly lower, and selecting only individuals with MLH1, MSH2 or EPCAM mutations, where lifetime colorectal cancer risk is approximately 50%, we were able to perform our comparative study in a relatively homogeneous population.
In conclusion, we found that rs8192917 SNP of the gene encoding GrB correlates with cancer risk in our LS cohort. Following validation of this finding in independent cohorts, this SNP can be included in personalized risk stratification and screening recommendations in affected individuals. Further research avenues might also include the functional assessment of the QPY and RAH variants of GrB to investigate possible differences in substrate-specificities that might explain the observed protective effect.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: European Variation Archive, Project ID: PRJEB55654, Analysis ID: ERZ13336597.

Ethics statement
The studies involving human participants were reviewed and approved by Scientific and Research Ethics Committee of the Medical Research Council of Hungary. Written informed consent A B FIGURE 3 Predicted cleavage sites of Granzyme B in shared neoantigens in MSI-H tumors Representation of each predicted cleavage site score in decreasing order. Sites with the highest probability are located on the left side, sites with the lowest probability are located on the right side (Panel A). Heatmap representing the top 8 cleavage sites of each shared neoantigen and their corresponding scores (Panel B). On the top of the heatmap gene IDs from which each neoantigen originates are presented. Detailed results of each cleavage site prediction are presented in Supplementary Table 2. to participate in this study was provided by the participants' legal guardian/next of kin.